
149 research outputs found

Bayesian optimization of the PC algorithm for learning Gaussian Bayesian networks

Author: B Malone
B Shahriari
C Bielza
CE Rasmussen
D Colombo
I Tsamardinos
J Hausser
M Kalisch
M Scutari
RO Ness
SL Lauritzen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/06/2018

The PC algorithm is a popular method for learning the structure of Gaussian Bayesian networks. It carries out statistical tests to determine absent edges in the network. It is hence governed by two parameters: (i) the type of test and (ii) its significance level. These parameters are usually set to values recommended by an expert. Nevertheless, such an approach can suffer from human bias, leading to suboptimal reconstruction results. In this paper we consider a more principled approach for choosing these parameters automatically. For this we optimize a reconstruction score evaluated on a set of different Gaussian Bayesian networks. This objective is expensive to evaluate and lacks a closed-form expression, which makes Bayesian optimization (BO) a natural choice. BO methods use a model to guide the search and are hence able to exploit smoothness properties of the objective surface. We show that the parameters found by a BO method outperform those found by a random search strategy and the expert recommendation. Importantly, we have found that an often overlooked statistical test provides the best overall reconstruction results.
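One common conditional-independence test for Gaussian data is Fisher's z test on a partial correlation. The abstract does not say which tests were compared, so the sketch below only illustrates how the two tuned parameters (the test and its significance level `alpha`) enter a PC-style edge-removal decision. The function names `partial_corr_1`, `fisher_z_pvalue` and `remove_edge` are hypothetical, and only first-order partial correlations (a single conditioning variable) are handled.

```python
import math


def partial_corr_1(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of X and Y given a single variable Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r: float, n: int, k: int) -> float:
    """Two-sided p-value of Fisher's z test for a (partial) correlation r
    estimated from n samples with a conditioning set of size k.
    Assumes |r| < 1, otherwise the log transform is undefined."""
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - k - 3)
    # Standard normal CDF expressed through the error function.
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)


def remove_edge(r: float, n: int, k: int, alpha: float) -> bool:
    """PC-style decision: drop the edge when independence is NOT rejected."""
    return fisher_z_pvalue(r, n, k) > alpha


# Hypothetical correlations: X and Y are related only through Z.
r = partial_corr_1(r_xy=0.4, r_xz=0.8, r_yz=0.5)
print(r, remove_edge(r, n=100, k=1, alpha=0.05))
```

Here the partial correlation is zero, so the edge X-Y is removed for any `alpha` below 1. Raising `alpha` makes the test reject independence more readily and therefore keeps more edges, which is why the significance level is worth tuning alongside the choice of test.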

arXiv.org e-Print Archive

Crossref

Challenges in the Multivariate Analysis of Mass Cytometry Data: The Effect of Randomization

Author: Cabrero D-G
Lagani V
Papoutsoglou G
Schmidt A
Tegner J
Tsamardinos I
Tsirlis K
Publication venue: 'Royal College of Obstetricians & Gynaecologists (RCOG)'
Publication date: 06/11/2019

Cytometry by time-of-flight (CyTOF) has emerged as a high-throughput single-cell technology able to provide large samples of protein readouts. A large pool of advanced high-dimensional analysis algorithms already exists that explore the observed heterogeneous distributions to make intriguing biological inferences. A fact largely overlooked by these methods, however, is the effect of the established data preprocessing pipeline on the distributions of the measured quantities. In this article, we focus on randomization, a transformation used to improve data visualization, which can negatively affect multivariate data analysis methods such as dimensionality reduction, clustering, and network reconstruction algorithms. Our results indicate that randomization should be used only for visualization purposes and not in conjunction with high-dimensional analytical tools.
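A minimal sketch of randomization, assuming the commonly described scheme in which a uniform random value from [0, 1) is subtracted from each integer ion count. It illustrates the distortion the abstract warns about: a marker with only two discrete levels becomes ten distinct continuous values, and zero counts are pushed below zero. The function name `randomize_counts` and the counts are hypothetical.

```python
import random
from typing import List


def randomize_counts(counts: List[int], rng: random.Random) -> List[float]:
    """Assumed scheme: subtract U[0, 1) noise from each integer ion count,
    so a count c becomes a value in the interval (c - 1, c]."""
    return [c - rng.random() for c in counts]


counts = [0] * 5 + [3] * 5  # hypothetical marker with two discrete levels
randomized = randomize_counts(counts, random.Random(0))

print(len(set(counts)), len(set(randomized)))  # distinct values before vs after
print(sum(v < 0 for v in randomized))          # zero counts pushed below zero
```

For plotting, this spreads the spike at zero into a band; for distance- or density-based analyses, it injects artificial variation that was never measured.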

UCL Discovery

Feature selection and prediction with a Markov blanket structure learning algorithm

Author: B Malone
I Tsamardinos
JR Quinlan
N Friedman
Yuan Tan
Z Liu
Zhifa Liu
Publication venue: 'Springer Science and Business Media LLC'

Crossref

An Experimental Comparison of Hybrid Algorithms for Bayesian Network Structure Learning

Author: A. Aussem
B. Ellis
C.F. Aliferis
D. Heckerman
D.M. Chickering
E. Perrier
G.E. Schwarz
I. Tsamardinos
J. Cheng
J. Pearl
J. Peña
J.M. Peña
K. Kojima
M. Koivisto
M. Scutari
S. Rodrigues de Morais
S.R. Morais de
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

We present a novel hybrid algorithm for Bayesian network structure learning, called Hybrid HPC (H2PC). It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. It is based on a subroutine called HPC, which combines ideas from incremental and divide-and-conquer constraint-based methods to learn the parents and children of a target variable. We conduct an experimental comparison of H2PC against Max-Min Hill-Climbing (MMHC), currently the most powerful state-of-the-art algorithm for Bayesian network structure learning, on several benchmarks with various data sizes. Our extensive experiments show that H2PC outperforms MMHC both in goodness of fit to new data and in the quality of the network structure itself, which is closer to the true dependence structure of the data. The source code (in R) of H2PC, as well as all data sets used for the empirical tests, is publicly available.
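The second phase of H2PC, a greedy score-based search confined to the learned skeleton, can be sketched as below. This is a simplified illustration under several assumptions: the skeleton is given, only arc additions are considered (standard hill-climbing also deletes and reverses arcs), and `toy_score` is a hypothetical stand-in for a Bayesian score such as BDeu. It is not the authors' R implementation; all names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Parents = Dict[str, FrozenSet[str]]


def creates_cycle(parents: Parents, child: str, new_parent: str) -> bool:
    """Adding new_parent -> child closes a cycle iff child is already an
    ancestor of new_parent, found by walking up from new_parent."""
    stack, seen = [new_parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def restricted_hill_climb(
    nodes: Iterable[str],
    skeleton: Iterable[Tuple[str, str]],
    local_score: Callable[[str, FrozenSet[str]], float],
) -> Parents:
    """Greedily add the best-scoring arc, considering only skeleton edges."""
    parents: Parents = {v: frozenset() for v in nodes}
    edges = list(skeleton)
    while True:
        best_gain, best_arc = 0.0, None
        for a, b in edges:
            for u, v in ((a, b), (b, a)):  # candidate arc u -> v
                if u in parents[v] or v in parents[u]:
                    continue  # this edge is already oriented
                if creates_cycle(parents, v, u):
                    continue
                gain = local_score(v, parents[v] | {u}) - local_score(v, parents[v])
                if gain > best_gain:
                    best_gain, best_arc = gain, (u, v)
        if best_arc is None:  # no arc improves the score: local optimum
            return parents
        u, v = best_arc
        parents[v] = parents[v] | {u}


# Hypothetical decomposable score that rewards the v-structure A -> B <- C.
TRUE_PARENTS = {"A": set(), "B": {"A", "C"}, "C": set()}


def toy_score(node: str, ps: FrozenSet[str]) -> float:
    return sum(1.0 if p in TRUE_PARENTS[node] else -0.5 for p in ps)


dag = restricted_hill_climb(["A", "B", "C"], [("A", "B"), ("B", "C")], toy_score)
print(dag)
```

Restricting candidates to skeleton edges is what keeps the score-based phase cheap: the search never considers arcs between variables that the constraint-based phase already separated.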

Crossref

HAL

Hal-Diderot

Optimizing monitorability of multi-cloud applications

Author: A Sheth
AN Toosi
C Zeginis
F Liu
G Aceto
I Tsamardinos
M Vitali
P Melià
R Kazhamiakin
T Kaur
W Funika
Y Gao
Í Goiri
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

When adopting a multi-cloud strategy, selecting the cloud providers on which to deploy VMs is crucial for ensuring good application behaviour. This selection usually focuses on general information about the performance and capabilities offered by the cloud providers. Less attention has been paid to monitoring services, although it is fundamental for the application developer to understand how the application behaves while it is running. In this paper we propose an approach based on a multi-objective mixed integer linear optimization problem for supporting the selection of cloud providers able to satisfy constraints on the monitoring dimensions associated with VMs. The approach considers the balance between the quality of the monitored data and the cost of obtaining these data, as well as the possibility for the cloud provider to enrich the set of monitored metrics through data analysis.
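The selection problem can be illustrated with a toy version: each VM is assigned one provider, an assignment is feasible only if each chosen provider offers every monitoring metric its VM requires, and the two objectives (monitoring quality and cost) are combined with a weighted sum. This is an assumption-laden stand-in: it enumerates all assignments instead of solving the paper's mixed integer linear program, and the provider names, metrics, quality scores and costs are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple


def select_providers(
    vms: List[dict], providers: Dict[str, dict], weight: float
) -> Optional[Tuple[str, ...]]:
    """Exhaustively search VM-to-provider assignments and return the feasible
    one maximising weight * quality - (1 - weight) * cost (weighted sum)."""
    best, best_value = None, float("-inf")
    for choice in product(providers, repeat=len(vms)):
        # Monitoring constraint: the provider must offer every required metric.
        if not all(vm["required_metrics"] <= providers[p]["metrics"]
                   for vm, p in zip(vms, choice)):
            continue
        quality = sum(providers[p]["quality"] for p in choice)
        cost = sum(providers[p]["cost"] for p in choice)
        value = weight * quality - (1 - weight) * cost
        if value > best_value:
            best, best_value = choice, value
    return best


# Hypothetical providers and VMs for illustration only.
PROVIDERS = {
    "P1": {"metrics": {"cpu"}, "quality": 0.6, "cost": 1.0},
    "P2": {"metrics": {"cpu", "latency"}, "quality": 0.9, "cost": 2.0},
}
VMS = [{"required_metrics": {"cpu"}}, {"required_metrics": {"cpu", "latency"}}]

print(select_providers(VMS, PROVIDERS, weight=0.5))
print(select_providers(VMS, PROVIDERS, weight=0.9))
```

Shifting `weight` toward quality moves the choice from the cheapest feasible assignment to the higher-quality one. Enumeration grows exponentially with the number of VMs, which is why a MILP formulation and solver are needed at realistic scale.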

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

A Markov blanket-based method for detecting causal SNPs in GWAS

Author: A Hamosh
BA McKinney
Bing Han
C Kooperberg
C-c Chang
CF Aliferis
D Koller
D Margaritis
DF Easton
HJ Cordell
I Tsamardinos
J Fellay
J Li
J Marchini
JH McDonald
JH Moore
JK Pritchard
LW Hahn
M Robnik-Šikonja
MD Ritchie
MD Shriver
Meeyoung Park
MY Park
P Spirtes
R Jiang
RJ Klein
RR Sokal
SE Antonarakis
SH Chen
SK Musani
ST Sherry
X-W Chen
Xue-wen Chen
Y Zhang
Publication venue: BioMed Central
Publication date: 01/04/2010

Background: Detecting epistatic interactions associated with complex and common diseases can help to improve the prevention, diagnosis and treatment of these diseases. With the development of genome-wide association studies (GWAS), designing powerful and robust computational methods for identifying epistatic interactions associated with common diseases has become a great challenge for the bioinformatics community, because the study of epistatic interactions often involves large genotype datasets and a huge number of combinations of possible genetic factors. Most existing computational detection methods are based on the classification capacity of SNP sets; they may fail to identify SNP sets that are strongly associated with the diseases and may introduce many false positives. In addition, most methods are not suitable for genome-wide studies because of their computational complexity.

Results: We propose a new Markov blanket-based method, DASSO-MB (Detection of ASSOciations using Markov Blanket), to detect epistatic interactions in case-control GWAS. The Markov blanket of a target variable T completely shields T from all other variables. Thus, we can guarantee that the SNP set detected by DASSO-MB has a strong association with diseases and contains the fewest false positives. Furthermore, DASSO-MB uses a heuristic search strategy that calculates the association between variables, avoiding the time-consuming training process of other machine-learning methods. We apply our algorithm to simulated datasets and a real case-control dataset. We compare DASSO-MB with other commonly used methods and show that our method significantly outperforms them and is capable of finding SNPs strongly associated with diseases.

Conclusions: Our study shows that DASSO-MB can identify a minimal set of causal SNPs associated with diseases, which contains fewer false positives than other existing methods. Given the huge size of the genomic datasets produced by GWAS, this is critical for saving the potential costs of biological experiments and for providing an efficient guideline for pathogenesis research.
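The claim that a Markov blanket shields T rests on its graphical definition: in a DAG, the Markov blanket of T consists of T's parents, its children, and its children's other parents (spouses); conditioned on this set, T is independent of every other variable. The sketch below computes that set from a known, hypothetical DAG. DASSO-MB itself estimates the blanket from data with a heuristic association-based search, which is not reproduced here; the function name is hypothetical.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Parents, children and spouses (co-parents of children) of target in a DAG
    given as a mapping node -> set of its parents."""
    children = {v for v, ps in parents.items() if target in ps}
    spouses = {p for c in children for p in parents[c]} - {target}
    return set(parents[target]) | children | spouses


# Hypothetical DAG: SNPs A and B affect disease T; T and D both affect C;
# E depends only on A and is therefore outside T's blanket.
DAG = {
    "A": set(), "B": set(), "D": set(),
    "T": {"A", "B"},
    "C": {"T", "D"},
    "E": {"A"},
}
print(sorted(markov_blanket(DAG, "T")))
```

In a GWAS setting with T as the disease status, only the SNPs inside this set carry information about T that cannot be explained away by the others, which is the intuition behind using the blanket to limit false positives.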

Crossref

Directory of Open Access Journals

KU ScholarWorks

PubMed Central

Hybrid Correlation and Causal Feature Selection for Ensemble Classifiers

Author: D. Margaritis
E. Bauer
F. Liu
H. Almuallim
H. Liu
H. Zhang
I. Guyon
I. Tsamardinos
I.H. Witten
J. Cheng
L. Breiman
L. Yu
M. Kudo
M. Wang
M.A. Hall
N. Friedman
P. Pudil
P. Spirtes
R. Duangsoithong
T. Windeatt
Y. Saeys
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

PC and TPDA are robust, well-known prototype algorithms that incorporate constraint-based approaches to causal discovery. However, neither algorithm can scale up to high-dimensional data, that is, more than a few hundred features. This chapter presents hybrid correlation and causal feature selection for ensemble classifiers to address this problem. Redundant features are first removed by correlation-based feature selection, and irrelevant features are then eliminated by causal feature selection. The proposed algorithms are compared with correlation-based feature selection algorithms (FCBF and CFS) and causal feature selection algorithms (PC, TPDA, GS, IAMB) in terms of the number of eliminated features, accuracy, the area under the receiver operating characteristic curve (AUC), and false negative rate (FNR).
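The correlation-based stage can be illustrated with FCBF, one of the filters the abstract names. Below is a minimal FCBF-style redundancy filter for discrete features, assuming symmetrical uncertainty (SU) as the correlation measure and a relevance threshold `delta`. It is not the chapter's hybrid method: the causal stage (PC, TPDA, GS or IAMB) and the ensemble are omitted, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Shannon entropy in bits of a discrete sequence."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def symmetrical_uncertainty(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """SU = 2 * I(X; Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    info_gain = hx + hy - entropy(list(zip(xs, ys)))
    return 2 * info_gain / (hx + hy)


def fcbf(features: Dict[str, Sequence[Hashable]], target: Sequence[Hashable],
         delta: float = 0.0) -> List[str]:
    """Keep relevant features (SU with target > delta), strongest first, and drop
    any feature at least as correlated with a kept feature as with the target."""
    relevance = {f: symmetrical_uncertainty(v, target) for f, v in features.items()}
    ranked = sorted((f for f in features if relevance[f] > delta),
                    key=lambda f: relevance[f], reverse=True)
    selected: List[str] = []
    for f in ranked:
        if all(symmetrical_uncertainty(features[s], features[f]) < relevance[f]
               for s in selected):
            selected.append(f)
    return selected


# Hypothetical data: x1 predicts y, x2 duplicates x1, x3 is unrelated to y.
y = [0, 0, 1, 1, 0, 0, 1, 1]
FEATURES = {"x1": list(y), "x2": list(y), "x3": [0, 1, 0, 1, 0, 1, 0, 1]}
print(fcbf(FEATURES, y))
```

The duplicate `x2` is discarded as redundant and `x3` as irrelevant, leaving a smaller feature set for the slower causal stage, which is the scaling argument the abstract makes.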

Crossref

University of Surrey

Surrey Research Insight

Learning biological network using mutual information and conditional independence

Author: C Chow
Chin-Rang Yang
D Heckerman
DM Chickering
Dong-Chul Kim
GF Cooper
I Tsamardinos
IA Beinlich
J Pearl
J Rissanen
Jean Gao
LM de Campos
N Friedman
SL Lauritzen
T Verma
W Lam
Xiaoyu Wang
XW Chen
YB Kim
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment

Author: Aasmets O
Berland M
Carrillo de Santa Pau E
Claesson MJ
Gruca A
Hasic J
Hron K
Karaduzovic-Hadziabdic K
Klammsteiner T
Kolev M
Lahti L
Loncar Turukalo T
Lopes MB
Marcos-Zambrano LJ
Moreno V
Moreno-Indias I
Naskinova I
Org E
Paciência I
Papoutsoglou G
Przymus P
Shigdel R
Stres B
Trajkovik V
Truu J
Tsamardinos I
Vilne B
Yousef M
Zdravevski E
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2021

The growing number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to explore in depth host-microbiome associations and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all the information in these biological datasets, taking into account the peculiarities of microbiome data, i.e., the compositional, heterogeneous and sparse nature of these datasets. The possibility of predicting host phenotypes based on taxonomy-informed feature selection, in order to establish associations between the microbiome and disease states and to predict those states, is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used to predict outputs, such as classification and prediction in microbiology, to infer host phenotypes in order to predict diseases, and to use microbial communities to stratify patients by their state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here relate mostly to the bacterial community, many algorithms could be applied in general, regardless of the feature type. This literature and software review covering this broad topic is aligned with the scoping review methodology. The manual identification of data sources has been complemented with (1) an automated publication search through the digital libraries of the three major publishers using a natural language processing (NLP) toolkit, and (2) automated identification of relevant software repositories on GitHub and ranking of the related research papers using a learning-to-rank approach.

This study was supported by COST Action CA18131 “Statistical and machine learning techniques in human microbiome studies”. Estonian Research Council grant PRG548 (JT). Spanish State Research Agency Juan de la Cierva Grant IJC2019-042188-I (LM-Z). EO was funded and OA was supported by Estonian Research Council grant PUT 1371 and EMBO Installation grant 3573. AG was supported by Statutory Research project of the Department of Computer Networks and Systems.

Repositório Aberto da Universidade do Porto

XPF interacts with TOP2B for R-loop processing and DNA looping on actively transcribed genes

Author: Akalestou-Clocher A.
Altmüller J.
Austin C.
Bouwman B.A.M.
Chatzinikolaou G.
Crosetto N.
Garinis G.A.
Goulielmaki E.
Siametis A.
Stratigi K.
Topalis P.
Tsamardinos I.
Publication venue: American Association for the Advancement of Science
Publication date: 10/11/2023

Co-transcriptional RNA-DNA hybrids can not only cause DNA damage that threatens genome integrity but also regulate gene activity through a mechanism that remains unclear. Here, we show that the nucleotide excision repair factor XPF interacts with the insulator binding protein CTCF and the cohesin subunits SMC1A and SMC3, leading to R-loop-dependent DNA looping upon transcription activation. To facilitate R-loop processing, XPF interacts with TOP2B and is recruited with it to active gene promoters, leading to double-strand break accumulation and the activation of a DNA damage response. Abrogation of TOP2B leads to diminished recruitment of XPF, CTCF, and the cohesin subunits to the promoters of actively transcribed genes and to R-loops, and to the concurrent impairment of CTCF-mediated DNA looping. Together, our findings disclose an essential role for XPF with TOP2B and the CTCF/cohesin complex in R-loop processing for transcription activation, with important ramifications for DNA repair-deficient syndromes associated with transcription-associated DNA damage.

MDC Repository